facial landmark detection
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (2 more...)
Deep Structured Prediction for Facial Landmark Detection
Existing deep learning based facial landmark detection methods have achieved excellent performance. These methods, however, do not explicitly embed the structural dependencies among landmark points. They hence cannot preserve the geometric relationships between landmark points or generalize well to challenging conditions or unseen data. This paper proposes a method for deep structured facial landmark detection based on combining a deep Convolutional Network with a Conditional Random Field. We demonstrate its superior performance to existing state-of-the-art techniques in facial landmark detection, especially a better generalization ability on challenging datasets that include large pose and occlusion.
Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer
Tong, Qiyi, Nocentini, Olivia, Lagomarsino, Marta, Cai, Kuanqi, Lorenzini, Marta, Ajoudani, Arash
Facial Landmark Detection (FLD) in thermal imagery is critical for applications in challenging lighting conditions, but it is hampered by the lack of rich visual cues. Conventional cross-modal solutions, like feature fusion or image translation from RGB data, are often computationally expensive or introduce structural artifacts, limiting their practical deployment. To address this, we propose Multi-Level Cross-Modal Knowledge Distillation (MLCM-KD), a novel framework that decouples high-fidelity RGB-to-thermal knowledge transfer from model compression to create both accurate and efficient thermal FLD models. A central challenge during knowledge transfer is the profound modality gap between RGB and thermal data, where traditional unidirectional distillation fails to enforce semantic consistency across disparate feature spaces. To overcome this, we introduce Dual-Injected Knowledge Distillation (DIKD), a bidirectional mechanism designed specifically for this task. DIKD establishes a connection between modalities: it not only guides the thermal student with rich RGB features but also validates the student's learned representations by feeding them back into the frozen teacher's prediction head. This closed-loop supervision forces the student to learn modality-invariant features that are semantically aligned with the teacher, ensuring a robust and profound knowledge transfer. Experiments show that our approach sets a new state-of-the-art on public thermal FLD benchmarks, notably outperforming previous methods while drastically reducing computational overhead.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (3 more...)
Heatmap Regression without Soft-Argmax for Facial Landmark Detection
Yang, Chiao-An, Yeh, Raymond A.
Facial landmark detection is an important task in computer vision with numerous applications, such as head pose estimation, expression analysis, face swapping, etc. Heatmap regression-based methods have been widely used to achieve state-of-the-art results in this task. These methods involve computing the argmax over the heatmaps to predict a landmark. Since argmax is not differentiable, these methods use a differentiable approximation, Soft-argmax, to enable end-to-end training on deep-nets. In this work, we revisit this long-standing choice of using Soft-argmax and demonstrate that it is not the only way to achieve strong performance. Instead, we propose an alternative training objective based on the classic structured prediction framework. Empirically, our method achieves state-of-the-art performance on three facial landmark benchmarks (WFLW, COFW, and 300W), converging 2.2x faster during training while maintaining better/competitive accuracy. Our code is available here: https://github.com/ca-joe-yang/regression-without-softarg.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.68)
Evaluation of facial landmark localization performance in a surgical setting
Frajtag, Ines, Švaco, Marko, Šuligoj, Filip
The use of robotics, computer vision, and their applications is becoming increasingly widespread in various fields, including medicine. Many face detection algorithms have found applications in neurosurgery, ophthalmology, and plastic surgery. A common challenge in using these algorithms is variable lighting conditions and the flexibility of detection positions to identify and precisely localize patients. The proposed experiment tests the MediaPipe algorithm for detecting facial landmarks in a controlled setting, using a robotic arm that automatically adjusts positions while the surgical light and the phantom remain in a fixed position. The results of this study demonstrate that the improved accuracy of facial landmark detection under surgical lighting significantly enhances the detection performance at larger yaw and pitch angles. The increase in standard deviation/dispersion occurs due to imprecise detection of selected facial landmarks. This analysis allows for a discussion on the potential integration of the MediaPipe algorithm into medical procedures.
- Europe > Croatia > Zagreb County > Zagreb (0.05)
- Asia > Middle East > Iraq > Nineveh Governorate > Mosul (0.04)
- Asia > India (0.04)
- (2 more...)
- Health & Medicine > Surgery (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.49)
Reviews: Deep Structured Prediction for Facial Landmark Detection
The integration of convnets with the conditional random fields to model the structural dependencies of facial landmarks during face alignment is nice contribution. Previously proposed methods in this direction were hybrid systems (eg. OpenFace versions) and not fully integrated. The authors evaluate on multiple datasets (300W, 300W-Video, Menpo & COFW-68) and compare results with other methods. Both inter- and cross-dataset performance are provided.